


The capacity of a quantum channel for simultaneous 
transmission of classical and quantum information 



I. Devetak 

IBM T.J. Watson Research Center, Yorktown Heights, NY 10598 



P. W. Shor



Department of Mathematics, MIT, Cambridge, MA 02139 
February 1, 2008 



Abstract 

An expression is derived characterizing the set of admissible rate pairs for simultaneous 
transmission of classical and quantum information over a given quantum channel, generalizing 
both the classical and quantum capacities of the channel. Although our formula involves
regularization, i.e. taking a limit over many copies of the channel, it reduces to a single-letter
expression in the case of generalized dephasing channels. Analogous formulas are conjectured
for the simultaneous public-private capacity of a quantum channel and for the simultaneously
1-way distillable common randomness and entanglement of a bipartite quantum state.




1 Introduction 



In the paper that marked the beginning of information theory, C. E. Shannon introduced the
notion of a (classical) channel $W$, a stochastic map modeling the effect of noise experienced by a
classical message on its way from sender to remote receiver. There he defined and computed the
key property of the channel $W$: its capacity $C(W)$ to convey classical information, expressed in
bits per channel use. Many decades later, in the context of quantum information theory, the notion
of a quantum channel $\mathcal{N}$, a cptp (completely positive trace preserving) map, was introduced as
the most general bipartite dynamic resource consistent with quantum mechanics. There are now
two basic capacities one may define for $\mathcal{N}$: classical $C(\mathcal{N})$ and quantum $Q(\mathcal{N})$. Intuitively, these
correspond to the maximum number of bits (respectively qubits) per use of $\mathcal{N}$ that can be faithfully
transmitted over the channel. The classical capacity theorem was independently proved by Holevo
[17] and by Schumacher and Westmoreland [23]. The quantum capacity theorem was originally stated
by Lloyd [19], although it was only recently generally realized that his proof could be made rigorous
[?]. It has also been proved by Shor [?] and subsequently, via the private classical capacity, by
Devetak [?]. In the present paper we unify the two capacities by investigating the capacity of $\mathcal{N}$
for simultaneously transmitting classical and quantum information, given in the form of a trade-off
curve.

Let the sender Alice and receiver Bob be connected via a quantum channel $\mathcal{N} : \mathcal{H}_{A'} \to \mathcal{H}_B$,
where $\mathcal{H}_{A'}$ denotes the Hilbert space of Alice's input system $A'$ and $\mathcal{H}_B$ that of Bob's output
system $B$. We shall define three distinct information processing scenarios which will turn out to
be equivalent.

Scenario Ia (subspace transmission) Alice's task is to convey to Bob, in some large number
$n$ of uses of the channel, one of $\mu$ equiprobable classical messages with low error probability and
simultaneously an arbitrarily chosen quantum state from some Hilbert space $\mathcal{H}$ of dimension $\kappa$
with high fidelity. More precisely, we define a (classical, quantum) channel code to consist of:

• An ordered set $(\mathcal{E}_m)_{m \in [\mu]}$, $[\mu] = \{1, 2, \dots, \mu\}$, of cptp maps $\mathcal{E}_m : \mathcal{H}_{A''} \to \mathcal{H}_{A'}^{\otimes n}$. Such an
ordered set is the most general function with two inputs, classical and quantum, and one quantum
output.

• A decoding quantum instrument [?] $\mathbf{D} = (\mathcal{D}_m)_{m \in [\mu]}$, an ordered set of cp (completely positive)
maps $\mathcal{D}_m : \mathcal{H}_B^{\otimes n} \to \mathcal{H}_{B'}$, the sum of which $\mathcal{D} = \sum_{m \in [\mu]} \mathcal{D}_m$ is trace preserving. The probability of
outcome $m$ for input $\rho$ is $\mathrm{Tr}\, \mathcal{D}_m(\rho)$, while the effective quantum map is $\mathcal{D}$. The instrument has one
quantum input and two outputs, classical and quantum. It is a natural generalization of a POVM
(positive operator valued measure), which cares only about the classical output, and of a quantum cptp
map, which only has a quantum output.

Alice's classical message is represented by a random variable $M$ uniformly distributed on the
set $[\mu]$. Conditional on $M$ taking on a particular value $m$, Alice encodes the quantum state of
$A''$ with $\mathcal{E}_m$ and sends it through $n$ copies of the channel $\mathcal{N}$. Bob performs the instrument $\mathbf{D}$ on
the channel output, resulting in the classical outcome random variable $M'$ and a quantum output
system $B'$. Note that $\mathcal{H}_{A''} = \mathcal{H}_{B'} = \mathcal{H}$. We call the ordered pair $((\mathcal{E}_m)_m, \mathbf{D})$ an $(n, \epsilon)$ code if

1. $\Pr\{M' \neq m \mid M = m\} \le \epsilon$, $\forall m$,

2. $\min_{|\phi\rangle \in \mathcal{H}} F(\phi, (\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E}_m)(\phi)) \ge 1 - \epsilon$, $\forall m$,

where the fidelity is defined by $F(\rho, \sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1^2$. Condition 1 above means that each message
should be correctly decoded by Bob with high probability. Condition 2 corresponds to the subspace
transmission criterion of [?]: each pure input state $|\phi\rangle$ supported on $\mathcal{H}$ should be almost perfectly
transmitted to Bob. The (classical, quantum) rate pair of the code is $(r, R)$, with $r = \frac{1}{n} \log \mu$ and
$R = \frac{1}{n} \log \kappa$. They represent the number of bits and qubits, respectively, per use of the channel
that can be faithfully transmitted simultaneously. A rate pair $(r, R)$ is called achievable if for all
$\epsilon, \delta > 0$ and all sufficiently large $n$ there exists an $(n, \epsilon)$ code with rate pair $(r - \delta, R - \delta)$. The
simultaneous (classical, quantum) scenario Ia capacity region of the channel $S_{Ia}(\mathcal{N})$ is the set of
all achievable positive rate pairs.
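
As a small illustration of the rate-pair definition above, the following sketch computes $(r, R)$ from the block length, the number of classical messages and the dimension of the transmitted subspace. The function name `rate_pair` and the example numbers are hypothetical, and base-2 logarithms are assumed so that the rates are in bits and qubits per channel use.

```python
import math

def rate_pair(n, mu, kappa):
    """Return (r, R) = (log2(mu) / n, log2(kappa) / n) for an n-use code.

    n     -- number of channel uses (block length)
    mu    -- number of equiprobable classical messages
    kappa -- dimension of the transmitted Hilbert space
    Base-2 logarithms are an assumption (bits and qubits per channel use).
    """
    if n <= 0 or mu < 1 or kappa < 1:
        raise ValueError("need n > 0, mu >= 1 and kappa >= 1")
    return math.log2(mu) / n, math.log2(kappa) / n

# Hypothetical code: 100 channel uses, 2^30 messages, a 2^20-dimensional subspace.
r, R = rate_pair(n=100, mu=2**30, kappa=2**20)
print(r, R)
```

With these illustrative numbers the code conveys 30 bits and 20 qubits over 100 channel uses.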



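Scenario Ib (entanglement transmission) This scenario is very similar to the first one, but
instead of transmitting an arbitrary pure state of $A''$, Alice is required to preserve entanglement
[?] between $A''$ and some reference system $A$ she has no access to. Here condition 2 is replaced by

2'. $F(\Phi, \Omega_m) \ge 1 - \epsilon$, $\forall m$,

where

$$\Omega_m^{AB'} = [\mathbf{1}^A \otimes (\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E}_m)](\Phi^{AA''}), \qquad (1)$$

and

$$|\Phi\rangle = \frac{1}{\sqrt{\kappa}} \sum_{k=1}^{\kappa} |k\rangle \otimes |k\rangle \qquad (2)$$

is the standard maximally entangled state on $\mathcal{H} \otimes \mathcal{H}$. We denote the corresponding capacity region
by $S_{Ib}(\mathcal{N})$.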



Scenario II (entanglement generation) In this scenario, simultaneously with transmitting
classical information, Alice wishes to generate entanglement [5] shared with Bob rather than
preserving it as in scenario Ib. Alice prepares, without loss of generality, a pure bipartite state
$|\Upsilon_m\rangle^{AA'^n}$ in her lab ($\mathcal{H}_{A'^n} := \mathcal{H}_{A'}^{\otimes n}$), depending on the classical information $m$, and sends it
through the channel. Bob decodes as above, yielding the output state

$$\Omega_m^{AB'} = [\mathbf{1}^A \otimes (\mathcal{D} \circ \mathcal{N}^{\otimes n})](\Upsilon_m^{AA'^n}), \qquad (3)$$

shared by Alice and Bob. Everything else is defined as in scenario Ib. The corresponding capacity
region is denoted by $S_{II}(\mathcal{N})$.

In the next section we state our main result, a unique expression for the capacity regions defined 
above, investigate its properties and relate it to previous work. The proof of our main theorem is 
relegated to section 3. Some remarks on related problems are collected in section 4. We conclude 
in section 5 with suggestions for future research. 



2 Main result 

Recall the notion of an ensemble of quantum states $E = \{p_x, |\phi_x\rangle^{AA'}\}$: the quantum system
$AA'$ is in the state $|\phi_x\rangle^{AA'}$ with probability $p_x$. The ensemble $E$ is equivalently represented by a
classical-quantum system [10] $XAA'$ in the state

$$\sum_x p_x |x\rangle\langle x|^X \otimes |\phi_x\rangle\langle\phi_x|^{AA'}.$$

$X$ plays the dual role of an auxiliary quantum system in the state $\sum_x p_x |x\rangle\langle x|$ and of a random
variable with distribution $p$. Sending the $A'$ system through the channel $\mathcal{N}$ gives rise to a classical-
quantum system $XAB$ in some state $\sigma^{XAB}$:

$$\sigma^{XAB} = \sum_x p_x |x\rangle\langle x|^X \otimes [(\mathbf{1} \otimes \mathcal{N})(|\phi_x\rangle\langle\phi_x|)]^{AB}. \qquad (4)$$

For such a state we say that it "arises from" the channel $\mathcal{N}$. For a multi-party state such as
$\sigma^{XAB}$ the reduced density operator $\sigma^A$ is defined by $\mathrm{Tr}_{XB}\, \sigma^{XAB}$. Conversely, we call $\sigma^{XAB}$ an
extension of $\sigma^A$. A pure extension is conventionally called a purification. Define the von Neumann
entropy of a quantum state $\rho$ by $H(\rho) = -\mathrm{Tr}(\rho \log \rho)$. We write $H(A)_\sigma = H(\sigma^A)$, omitting the
subscript when the reference state is clear from the context. The Shannon entropy $-\sum_x p_x \log p_x$
of the random variable $X$ is equal to the von Neumann entropy $H(X)$ of the system $X$. Define
the conditional entropy

$$H(A|B) = H(AB) - H(B),$$

(quantum) mutual information

$$I(A; B) = H(A) + H(B) - H(AB),$$

and conditional mutual information

$$I(A; B|X) = I(A; BX) - I(A; X).$$

The coherent information $I(A\rangle B)$ is defined as $-H(A|B)$. Whenever the state $\rho^{AB}$ comes about
by sending some pure state $|\phi\rangle^{AA'}$ through the channel $\mathcal{N}$, we may use the alternative notation
[?] $I_c(\phi^{A'}, \mathcal{N}) := I(A\rangle B)_\rho$, since this quantity is independent of the particular purification $|\phi\rangle^{AA'}$
of $\phi^{A'}$. In what follows all information theoretical quantities will refer to the state $\sigma^{XAB}$, unless
stated otherwise.
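
As a quick sanity check of these definitions, consider the simplest case (chosen here purely for illustration): a noiseless qubit channel with Alice sending half of a maximally entangled qubit pair, so that $\rho^{AB}$ is itself the pure maximally entangled state. Logarithms are taken to base 2.

```latex
\begin{align*}
H(A) &= H(B) = 1, \qquad H(AB) = 0 \quad \text{(pure joint state, maximally mixed marginals)},\\
I(A;B) &= H(A) + H(B) - H(AB) = 2,\\
I(A\rangle B) &= H(B) - H(AB) = 1 .
\end{align*}
```

The coherent information of one qubit per channel use matches the intuition that the noiseless qubit channel transmits one qubit faithfully.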

Our main result is the following theorem.

Figure 1: A generic trade-off curve for the simultaneous (classical, quantum) capacity region (solid
line). The dashed-dotted line represents the time-sharing inner bound. The dashed line is the outer
bound which follows from the observation that the transmitted quantum subspace may always be
used to encode classical information at 1 bit/qubit. The continuation to the negative $R$ axis (see
text) is shown for scenario II (solid) and scenario I (dashed).

Theorem 1 The simultaneous capacity regions of $\mathcal{N}$ for the various scenarios Ia, Ib and II are
all equal and given by

$$S(\mathcal{N}) = \bigcup_{l=1}^{\infty} \frac{1}{l} S^{(1)}(\mathcal{N}^{\otimes l}), \qquad (5)$$

where $S^{(1)}(\mathcal{N})$ is the union, over all $\sigma^{XAB}$ arising from the channel $\mathcal{N}$, of the $(r, R)$ pairs obeying

$$0 \le r \le I(X; B)$$
$$0 \le R \le I(A\rangle BX). \qquad (6)$$

Furthermore, in computing $S^{(1)}(\mathcal{N})$ one only needs to consider random variables $X$ defined on
some set $\mathcal{X}$ of cardinality $|\mathcal{X}| \le (\dim \mathcal{H}_{A'})^2 + 2$.

Since the three scenarios are equivalent we shall speak of a single capacity region. The generic
shape of the capacity region is shown in figure 1. We shall informally refer to the outer boundary
of the capacity region in the $(r \ge 0, R \ge 0)$ quadrant as the "trade-off curve". In scenarios Ib
and II, for any $0 \le \lambda \le 1$, combining a $(\lambda n, \epsilon)$ code of rate pair $(r_1, R_1)$ with a $((1 - \lambda)n, \epsilon)$ code
of rate pair $(r_2, R_2)$ one obtains an $(n, 2\epsilon)$ code of rate pair $(\lambda r_1 + (1 - \lambda) r_2, \lambda R_1 + (1 - \lambda) R_2)$.
This construction is known as time-sharing and implies the concavity of the capacity region. In
fact, the "single-letter" region $S^{(1)}(\mathcal{N})$ is already concave for all channels $\mathcal{N}$ (see appendix A).
The points $(C(\mathcal{N}), 0)$ and $(0, Q(\mathcal{N}))$ represent the classical and quantum capacities, respectively.
By time-sharing one may achieve the line segment interpolating between the two, giving an inner
bound on the capacity region. An outer bound given by the line segment connecting $(C(\mathcal{N}), 0)$
and $(0, C(\mathcal{N}))$ is obtained by observing that, in scenario Ia, the transmitted quantum subspace
may always be used to encode classical information at 1 bit/qubit.
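
The time-sharing construction and the two linear bounds can be sketched numerically as follows. This is only an illustration: the function names and the capacity values `C = 1.0`, `Q = 0.5` are hypothetical and are not taken from any particular channel.

```python
def time_share(pair1, pair2, lam):
    """Rate pair obtained by using code 1 on a fraction lam of the channel
    uses and code 2 on the remaining fraction 1 - lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    r1, R1 = pair1
    r2, R2 = pair2
    return (lam * r1 + (1 - lam) * r2, lam * R1 + (1 - lam) * R2)

def inner_bound_R(r, C, Q):
    """Quantum rate on the time-sharing segment joining (C, 0) and (0, Q)."""
    return Q * (1 - r / C)

def outer_bound_R(r, C):
    """Quantum rate on the outer-bound segment joining (C, 0) and (0, C)."""
    return C - r

# Hypothetical capacities, for illustration only.
C, Q = 1.0, 0.5
point = time_share((C, 0.0), (0.0, Q), lam=0.25)
print(point, inner_bound_R(point[0], C, Q), outer_bound_R(point[0], C))
```

Any point of the true trade-off curve at classical rate `r` must have a quantum rate between the two values returned by `inner_bound_R` and `outer_bound_R`.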



[Figure 2: two plots, with axes labeled $r$ and $R$.]

Figure 2: The trade-off curve for the dephasing qubit channel with dephasing parameter 0.2 (i.e.,
the channel obtained by applying the identity operator with probability 0.9 and $\sigma_z$ with probability
0.1). In the left-hand plot, the trade-off curve is plotted with a solid line and the time-sharing
bound with a dashed line. The right-hand plot gives the difference between the optimal strategy
and time-sharing.

Our theorem is, alas, difficult to use in practice due to the $l \to \infty$ limit. Two simple examples
in which this limit is not required are the noiseless qubit channel and the erasure channel, for
which both $C(\mathcal{N})$ and $Q(\mathcal{N})$ were previously known. In both cases the boring time-sharing
strategy turns out to be optimal. This is particularly trivial to see for the noiseless channel: since
$Q(\mathcal{N}) = C(\mathcal{N})$, the inner and outer bound coincide.

A more interesting case is that of a dephasing channel, for which the large $l$ limit is also not
required (we prove this in appendix B), yet the resulting trade-off curve is strictly concave. The
$S(\mathcal{N})$ region for the dephasing qubit channel with dephasing parameter 0.2 is shown in figure 2.

For the depolarizing channel, another popular example, the $l \to \infty$ limit is known to be needed
when the depolarizing parameter is close to $p = 0.189$, the value making $Q(\mathcal{N}) = 0$. One can,
however, make an interesting observation about the behavior of the trade-off curve near $R = Q(\mathcal{N})$.
Although the channel itself is invariant under unitary transformations, the $\rho$ that maximizes the
coherent information $I_c(\rho, \mathcal{N})$ breaks this symmetry; indeed there is a whole family of density
operators attaining $Q(\mathcal{N})$. One can thus construct an ensemble with $R = Q(\mathcal{N})$ and $r > 0$, so
the trade-off curve is parallel to the $r$ axis in a finite region around $r = 0$. For the depolarizing
channel with different $p$, we have calculated the trade-off curve assuming $l = 1$ and found some
interesting behavior. For $p$ small ($p < 0.04$ or so) it is possible to do better than the time-sharing
strategy, whereas for larger $p$ ($p > 0.05$), the time-sharing strategy is optimal, assuming $l = 1$. For
these values of $p$, it is not known whether taking large $l$ is advantageous for $Q(\mathcal{N})$.
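
The threshold value $p \approx 0.189$ quoted above can be reproduced with a short numerical sketch, under two assumptions that the text does not spell out: the depolarizing channel applies each of the three Pauli errors with probability $p/3$, and the input is maximally mixed, in which case the single-copy coherent information is $1 - H_2(p) - p \log_2 3$. The function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary Shannon entropy H_2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def single_copy_rate(p):
    """Coherent information for a maximally mixed input, assuming Pauli
    errors X, Y, Z each occur with probability p / 3."""
    return 1.0 - binary_entropy(p) - p * math.log2(3)

def zero_crossing(lo=0.01, hi=0.5, tol=1e-10):
    """Bisection for the root of single_copy_rate, which is positive at lo,
    negative at hi and decreasing in between."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if single_copy_rate(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_star = zero_crossing()
print(round(p_star, 4))
```

Under these assumptions the root lies between 0.189 and 0.190, consistent with the value quoted in the text.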

There is an intriguing connection between our capacity region and the findings of Shor [26]
concerning the classical capacity of a quantum channel with limited entanglement assistance. The
latter may be thought of as extending scenario II to the negative $R$ axis, since entanglement is
consumed rather than generated [14]. The result for the $R \le 0$ region parallels that from theorem
1, replacing (6) by

$$0 \le r \le I(X; B) + I(A; B|X)$$
$$R \le -H(A|X) = I(A\rangle BX) - I(A; B|X). \qquad (7)$$

The two expressions on the right hand side have the same sum as in equation (6). There is a simple
bijection between the two regions: if $(r, R)$ is a point in $R \ge 0$ corresponding to the state $\sigma^{XAB}$,
then $(r + I(A; B|X), R - I(A; B|X))$ is a point in the $R \le 0$ region, and vice versa. Imagine that
1 ebit of entanglement were a stronger resource than 1 bit of communication, in the sense that the
latter could be produced from the former. Then the $R \le 0$ region would be trivially achievable by
the achievability of the $R \ge 0$ region. The opposite would hold were 1 bit stronger than 1 ebit.
However, it is well known that bits and ebits are incomparable resources. The correspondence
between the two regions may be interpreted as providing a limited sense in which bits and ebits
may be thought of as equally strong.
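
The identity behind the second line of the $R \le 0$ region, and the statement that the two right hand sides keep the same sum, can be checked by expanding every quantity into entropies. The short derivation below is a sketch that uses only the standard expansions $I(A\rangle BX) = H(BX) - H(ABX)$ and $H(A|X) = H(AX) - H(X)$.

```latex
\begin{align*}
I(A;B|X) &= I(A;BX) - I(A;X) \\
         &= \bigl[H(A) + H(BX) - H(ABX)\bigr] - \bigl[H(A) + H(X) - H(AX)\bigr] \\
         &= H(BX) - H(ABX) + H(AX) - H(X), \\
I(A\rangle BX) - I(A;B|X) &= \bigl[H(BX) - H(ABX)\bigr] - \bigl[H(BX) - H(ABX) + H(AX) - H(X)\bigr] \\
         &= H(X) - H(AX) = -H(A|X), \\
\bigl[I(X;B) + I(A;B|X)\bigr] + \bigl[-H(A|X)\bigr] &= I(X;B) + I(A\rangle BX).
\end{align*}
```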

One may play the same game in the context of scenario I (a or b), with a somewhat less
interesting outcome. Here a negative rate $R$ is interpreted as assistance by a noiseless quantum
channel. It is known [25] that the classical capacity of a noiseless channel combined with a noisy
one is just the sum of the individual capacities. Hence the scenario I continuation of our trade-off
curve simply follows the linear outer bound into the $R < 0$ region (see figure 1).



3 Proof of theorem 1 

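The following lemma from [?] is needed to relate scenarios Ia and Ib.

Lemma 2 1. If

$$\min_{|\phi\rangle \in \mathcal{H}} F(\phi, (\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E})(\phi)) \ge 1 - \frac{2}{3}\epsilon$$

then

$$F(\Phi, [\mathbf{1} \otimes (\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E})](\Phi)) \ge 1 - \epsilon. \qquad (8)$$

2. Conversely, if (8) holds then

$$\min_{|\phi\rangle \in \mathcal{H}'} F(\phi, (\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E})(\phi)) \ge 1 - 2\epsilon,$$

where $\mathcal{H}'$ is a subspace of $\mathcal{H}$ satisfying

$$\dim \mathcal{H}' \ge \frac{1}{2} \dim \mathcal{H}. \qquad (9)$$

Observe that $S_{Ia}(\mathcal{N}) = S_{Ib}(\mathcal{N}) \subseteq S_{II}(\mathcal{N})$. The equality follows from both parts of lemma 2.
The inclusion is obvious since one can always generate entanglement by transmitting half of the
maximally entangled state $|\Phi\rangle$. Therefore, to prove theorem 1 it suffices to show that the region
(5) is contained in $S_{Ib}(\mathcal{N})$ (the "direct coding theorem") and contains $S_{II}(\mathcal{N})$ (the "converse").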

To prove the converse we need the following simple lemma [?].

Lemma 3 For two bipartite states $\rho^{AB}$ and $\sigma^{AB}$ of a quantum system $AB$ of dimension $d$ with
fidelity $f = F(\rho^{AB}, \sigma^{AB})$,

$$|I(A\rangle B)_\rho - I(A\rangle B)_\sigma| \le \frac{2}{e} + 4 \log d \sqrt{1 - f}.$$

Proof of theorem 1 (converse for scenario II) Define the classical-quantum state $\omega^{MAB^n}$
to be the result of sending the $A'^n$ part of

$$\frac{1}{\mu} \sum_m |m\rangle\langle m|^M \otimes \Upsilon_m^{AA'^n}$$

through the channel $\mathcal{N}^{\otimes n}$. We shall prove that, for any $\delta, \epsilon > 0$ and all sufficiently large $n$, if an
$(n, \epsilon)$ code has a rate pair $(r, R)$ then

$$r - \delta \le \frac{1}{n} I(M; B^n)_\omega, \qquad (10)$$

$$R - \delta \le \frac{1}{n} I(A\rangle B^n M)_\omega. \qquad (11)$$

Evidently, it suffices to prove this for $\delta \le 1$, $\epsilon \le \left[\frac{\delta}{16 \log \dim \mathcal{H}_{A'}}\right]^2$ and $n \ge \frac{2}{\delta}$. Fano's inequality [?]
says

$$H(M|M') \le 1 + \Pr\{M' \neq M\}\, nr.$$

Equation (10) is a consequence of the following string of inequalities

$$nr = H(M)$$
$$= I(M; M') + H(M|M')$$
$$\le I(M; M') + 1 + n\epsilon \log \dim \mathcal{H}_{A'}$$
$$\le I(M; B^n) + 1 + n\epsilon \log \dim \mathcal{H}_{A'},$$

the last line by the Holevo bound [16]. On the other hand, defining $\Omega^{MAB'}$ as the state $\omega^{MAB^n}$
after Bob's decoding $\mathcal{D}$,

$$I(A\rangle B^n M)_\omega \ge I(A\rangle B' M)_\Omega$$
$$\ge I(A\rangle B')_\Omega$$
$$\ge I(A\rangle B')_\Phi - \frac{2}{e} - 8nR\sqrt{\epsilon}$$
$$\ge nR - \frac{2}{e} - 8n \log \dim \mathcal{H}_{A'} \sqrt{\epsilon},$$

from which the claim follows. The first inequality is the data processing inequality [2], the
second follows from the fact that conditioning cannot increase quantum relative entropy [20] and
the third is an application of lemma 3. It should be noted that we only used a weaker "average"
version of conditions 1. and 2'., namely

1. $\Pr\{M' \neq M\} \le \epsilon$,

2'. $F(\Phi, \frac{1}{\mu} \sum_m \Omega_m) \ge 1 - \epsilon$.

The bound on the cardinality of $\mathcal{X}$ is proven in appendix C. ■

We henceforth restrict attention to scenario Ib. In proving the direct coding theorem, we shall
combine purely quantum and purely classical codes. A quantum code is a special case of a (classical,
quantum) code defined earlier, for which $\mu = 1$ ($r = 0$). Quantum codes are characterized by a
pair of encoding and decoding maps $(\mathcal{E}, \mathcal{D})$. Define the quantum code density operator $\omega$ as $\mathcal{E}(\pi)$,
where $\pi = \frac{1}{\kappa} \mathbf{1}$ is the maximally mixed state on $\mathcal{H}$.

Often in coding theory it is useful to consider random codes. Alice and Bob have access to
an auxiliary resource: a common source of randomness described by some probability distribution
$(P_\alpha)$. A random quantum code is an ordered set of encoding-decoding pairs $((\mathcal{E}^\alpha, \mathcal{D}^\alpha))_\alpha$, indexed
by $\alpha$. With probability $P_\alpha$, Alice and Bob choose to employ the deterministic code $(\mathcal{E}^\alpha, \mathcal{D}^\alpha)$. The
average code density operator for the random quantum code is given by $\omega = \sum_\alpha P_\alpha \mathcal{E}^\alpha(\pi)$. Given a
density operator $\rho \in \mathcal{H}_{A'}$, we say that an $(n, \epsilon)$ random quantum code is "$\rho$-type" if the average
code density operator $\omega$ satisfies

$$\|\omega - \rho^{\otimes n}\|_1 \le \epsilon. \qquad (12)$$

For an ensemble of density operators $E = \{\rho_x, p_x\}$ defined on $\mathcal{H}_{A'}$ and sequence $x^n = x_1 x_2 \dots x_n$
denote $\rho_{x^n} = \bigotimes_{i=1}^n \rho_{x_i}$. We say that an $(n, \epsilon)$ random quantum code is "$(E, x^n)$-type" if the
average code density operator $\omega_{x^n}$ satisfies

$$\|\omega_{x^n} - \rho_{x^n}\|_1 \le \epsilon.$$

The following proposition is a refinement of the quantum channel coding theorem, and was
proved in Appendix D of [?]. A perhaps more accessible outline of the proof may be found in [?].

Proposition 4 For any $\epsilon, \delta > 0$ and all sufficiently large $n$, there exists a random $\rho$-type $(n, \epsilon)$
quantum code for the channel $\mathcal{N}$ of rate $R = I_c(\rho, \mathcal{N}) - \delta$.

Recall the notion of $\delta$-typical sequences

$$\mathcal{T}^n_{p,\delta} = \{x^n : \forall x \ |N(x|x^n) - n p_x| \le \delta n\},$$

where $N(x|x^n)$ counts the number of occurrences of $x$ in $x^n$. When the distribution $p$ is associated
with some random variable $X$ the alternative notation $\mathcal{T}^n_{X,\delta}$ may be used. Proposition 4 extends
to:

Proposition 5 For any $\epsilon, \delta > 0$ and all sufficiently large $n$, for any typical sequence $x^n \in \mathcal{T}^n_{X,\delta}$ there
exists a random $(E, x^n)$-type $(n, \epsilon)$ quantum code for the channel $\mathcal{N}$ of rate $R = \sum_x p_x I_c(\rho_x, \mathcal{N}) - c\delta$,
for some constant $c$.
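
The typicality condition used in proposition 5 is easy to test for a concrete sequence. The sketch below is illustrative only: the function name `is_typical`, the dictionary representation of the distribution and the example sequences are hypothetical, and the non-strict inequality at the boundary is an assumption.

```python
from collections import Counter

def is_typical(seq, p, delta):
    """Check whether seq is delta-typical for the distribution p.

    seq   -- sequence of symbols x_1 ... x_n
    p     -- dict mapping each symbol x to its probability p_x
    delta -- typicality parameter
    The condition checked is |N(x|seq) - n * p_x| <= delta * n for every x,
    where N(x|seq) is the number of occurrences of x in seq.
    """
    n = len(seq)
    counts = Counter(seq)
    # A symbol outside the alphabet of p can never be typical.
    if any(sym not in p for sym in counts):
        return False
    return all(abs(counts.get(x, 0) - n * px) <= delta * n for x, px in p.items())

p = {"a": 0.5, "b": 0.5}
print(is_typical("aabb" * 5, p, delta=0.1))           # balanced sequence
print(is_typical("a" * 18 + "b" * 2, p, delta=0.1))   # heavily skewed sequence
```

The balanced sequence passes the check, while the skewed one has 18 occurrences of "a" against an expected 10 and fails it.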



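Proof By proposition 4, for sufficiently large $n$, for all $x$ there exists an $(n[p_x - \delta], \epsilon)$ code of
rate $R_x = I_c(\rho_x, \mathcal{N}) - \delta$ with average density operator $\omega_x$ satisfying

$$\|\omega_x - \rho_x^{\otimes n[p_x - \delta]}\|_1 \le \epsilon.$$

By "pasting" $|\mathcal{X}|$ such codes together (one for each $x$) an $(n - n|\mathcal{X}|\delta, |\mathcal{X}|\epsilon)$ code is produced with
average code density operator $\omega = \bigotimes_x \omega_x$. Applying the triangle inequality multiple times,

$$\Big\|\omega - \bigotimes_x \rho_x^{\otimes n[p_x - \delta]}\Big\|_1 \le |\mathcal{X}|\epsilon. \qquad (13)$$

Given $x^n \in \mathcal{T}^n_{X,\delta}$, abbreviate $n_x = N(x|x^n)$ and $\Delta n_x = n_x - n[p_x - \delta]$. Now transform the above
code into the "padded" $(n, |\mathcal{X}|\epsilon)$ quantum code obtained by inserting $\rho_x^{\otimes \Delta n_x}$ after each $\omega_x$; its
average density operator $\omega'$ obeys

$$\Big\|\omega' - \bigotimes_x \rho_x^{\otimes n_x}\Big\|_1 \le |\mathcal{X}|\epsilon. \qquad (14)$$

The new rate $R$ is bounded by

$$R = \sum_x R_x [p_x - \delta] \ge \sum_x p_x I_c(\rho_x, \mathcal{N}) - \delta(1 + |\mathcal{X}| \log \dim \mathcal{H}_{A'}).$$

Finally, as $\bigotimes_x \rho_x^{\otimes n_x}$ and $\rho_{x^n}$ are related by a permutation of the channel input Hilbert spaces and
the channel $\mathcal{N}^{\otimes n}$ is invariant under such permutations, there exists an $(n, |\mathcal{X}|\epsilon)$ code of the same
rate $R$ and average code density operator $\omega_{x^n}$ such that

$$\|\omega_{x^n} - \rho_{x^n}\|_1 \le |\mathcal{X}|\epsilon.$$

■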

On the classical side, we shall need the Holevo-Schumacher-Westmoreland (HSW) theorem
[17, 23], or rather its "typical codeword" version [5]. Consider the restriction of $\sigma^{XAB}$ (4) to $XB$:

$$\sigma^{XB} = \sum_x p_x |x\rangle\langle x|^X \otimes \mathcal{N}(\phi_x^{A'})^B.$$

Proposition 6 (HSW Theorem) For any $\epsilon, \delta > 0$, define $r = I(X; B) - c'\delta$, for some constant
$c'$, and $\mu = 2^{nr}$. For all sufficiently large $n$, there exists a classical encoding map $f : [\mu] \to \mathcal{T}^n_{X,\delta}$
and a decoding POVM $\Lambda = (\Lambda_m)_{m \in [\mu]}$, such that

$$\mathrm{Tr}\, \tau_m \Lambda_m \ge 1 - \epsilon, \quad \forall m \in [\mu],$$

where

$$\tau_m = \mathcal{N}^{\otimes n}(\phi_{f(m)}^{A'^n})$$

and $\phi_{x^n}^{A'^n} = \bigotimes_{i=1}^n \phi_{x_i}^{A'}$.

Proposition 6 says that Bob may reliably distinguish among $\mu$ states of the form $\mathcal{N}^{\otimes n}(\phi_{x^n}^{A'^n})$,
with $x^n \in \mathcal{T}^n_{X,\delta}$. The idea behind the proof of the direct coding theorem is for Alice to use a different
quantum code depending on the classical message to be sent. Bob first decodes the classical
message (while causing almost no disturbance to the quantum system) by taking advantage of the
distinguishability of the channel outputs for the different codes. Furthermore, the same information
tells him which quantum decoding to perform! Thus, the classical information has been
"piggy-backed" on top of the quantum information.

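Proof of Theorem 1 (coding for scenario Ib) Recall that in scenario Ib Alice is transmitting half
of the maximally entangled state $|\Phi\rangle$ through the channel. Define $\mu$, $f$, and $\Lambda$ as in proposition 6.
For now we shall assume Alice and Bob have access to a common source of randomness with
distribution $(P_\alpha)$. For each $m$ define a $(\{p_x, \phi_x^{A'}\}, f(m))$-type $(n, \epsilon)$ random quantum code of rate
$R = I(A\rangle BX) - c\delta$ by the encoding and decoding operators $((\mathcal{E}^\alpha_m, \tilde{\mathcal{D}}^\alpha_m))_\alpha$. By proposition 5 and
monotonicity of trace distance [20] we have, for all $m$ and sufficiently large $n$,

$$\Big\|\sum_\alpha P_\alpha \tau^\alpha_m - \tau_m\Big\|_1 \le \epsilon,$$

where $\tau^\alpha_m = (\mathcal{N}^{\otimes n} \circ \mathcal{E}^\alpha_m)(\pi)$. By proposition 6,

$$\sum_\alpha P_\alpha \mathrm{Tr}\, \tau^\alpha_m \Lambda_m \ge 1 - 2\epsilon. \qquad (15)$$

For a specific value of $\alpha$, the encoding map for our (classical, quantum) code is given by $(\mathcal{E}^\alpha_m)_{m \in [\mu]}$.
The decoding instrument $\mathbf{D}^\alpha = (\mathcal{D}^\alpha_m)_{m \in [\mu]}$ is given by

$$\mathcal{D}^\alpha_m : \rho \mapsto \tilde{\mathcal{D}}^\alpha_m(\sqrt{\Lambda_m}\, \rho\, \sqrt{\Lambda_m}).$$

As usual, $\mathcal{D}^\alpha = \sum_m \mathcal{D}^\alpha_m$ denotes the induced quantum decoding operation. By (15), for all $m$,

$$\sum_\alpha P_\alpha\, p_{err}^{m,\alpha} \le 2\epsilon, \qquad (16)$$

where $p_{err}^{m,\alpha} = 1 - \mathrm{Tr}\, \mathcal{D}^\alpha_m(\tau^\alpha_m)$. Defining an extension of $\tau^\alpha_m$

$$\zeta^\alpha_m = [\mathbf{1} \otimes (\mathcal{N}^{\otimes n} \circ \mathcal{E}^\alpha_m)](|\Phi\rangle\langle\Phi|),$$

it follows from (15) that

$$\sum_\alpha P_\alpha \mathrm{Tr}\, \zeta^\alpha_m (\mathbf{1} \otimes \Lambda_m) \ge 1 - 2\epsilon.$$

Invoking the gentle measurement lemma [29] and the concavity of the square root function,

$$\sum_\alpha P_\alpha \big\|(\mathbf{1} \otimes \sqrt{\Lambda_m})\, \zeta^\alpha_m\, (\mathbf{1} \otimes \sqrt{\Lambda_m}) - \zeta^\alpha_m\big\|_1 \le 4\sqrt{\epsilon},$$

which by the monotonicity of trace distance [20] gives

$$\sum_\alpha P_\alpha \big\|(\mathbf{1} \otimes \mathcal{D}^\alpha_m)(\zeta^\alpha_m) - (\mathbf{1} \otimes \tilde{\mathcal{D}}^\alpha_m)(\zeta^\alpha_m)\big\|_1 \le 4\sqrt{\epsilon}.$$

On the other hand, averaging over $\alpha$ and using (16),

$$\sum_\alpha P_\alpha \big\|(\mathbf{1} \otimes \mathcal{D}^\alpha)(\zeta^\alpha_m) - (\mathbf{1} \otimes \mathcal{D}^\alpha_m)(\zeta^\alpha_m)\big\|_1 \le \sum_\alpha P_\alpha \sum_{m' \neq m} \big\|(\mathbf{1} \otimes \mathcal{D}^\alpha_{m'})(\zeta^\alpha_m)\big\|_1 \le 2\epsilon.$$

Since, for all $m$, $\alpha$,

$$F((\mathbf{1} \otimes \tilde{\mathcal{D}}^\alpha_m)(\zeta^\alpha_m), \Phi) \ge 1 - \epsilon,$$

putting everything together gives, for all $m$,

$$\sum_\alpha P_\alpha\, P_{err}^{m,\alpha} \le 3\epsilon + 4\sqrt{\epsilon}, \qquad (17)$$

where $P_{err}^{m,\alpha} = 1 - F((\mathbf{1} \otimes \mathcal{D}^\alpha)(\zeta^\alpha_m), \Phi)$.

At this point our code relies on Alice and Bob having access to the common random index $\alpha$.
To prove the theorem it remains to "derandomize" the code, i.e. show that $p_{err}^{m,\alpha}$ and $P_{err}^{m,\alpha}$ are
small for a particular value of $\alpha$, and for $m$ in a sufficiently large subset of $[\mu]$. By (16) and (17),

$$\sum_\alpha P_\alpha \frac{1}{\mu} \sum_m \big(p_{err}^{m,\alpha} + P_{err}^{m,\alpha}\big) \le 5\epsilon + 4\sqrt{\epsilon}.$$

There exists a particular $\alpha$ for which

$$\frac{1}{\mu} \sum_m \big(p_{err}^{m,\alpha} + P_{err}^{m,\alpha}\big) \le 5\epsilon + 4\sqrt{\epsilon}.$$

Fixing $\alpha$, expurgate the worst half of the codewords, i.e. those $m$ with the highest value of
$p_{err}^{m,\alpha} + P_{err}^{m,\alpha}$. Now we have a code with both $p_{err}^{m,\alpha}$ and $P_{err}^{m,\alpha}$ bounded from above by $10\epsilon + 8\sqrt{\epsilon}$
for all remaining $m$, while the classical rate has only decreased by $\frac{1}{n}$. This concludes the proof. ■

4 Remarks on related problems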



The first remark we make concerns replacing the classical-quantum dichotomy with the
cryptographically relevant public-private one. In [?] quantum codes were built based on private
information transmission ones. The purpose of the latter is to send classical information about which
the potential eavesdropper (to whom the "environment" of the channel is granted) cannot learn
anything. This should be contrasted with HSW codes, which may be viewed as transmitting public
information. One may now consider the problem of finding the simultaneous (public, private)
capacity of $\mathcal{N}$. The answer follows in a straightforward manner from the methods of [?] and those
used in proving theorem 1. Viewing the channel $\mathcal{N}$ as being embedded in an isometry $U_\mathcal{N}$ with an
enlarged target Hilbert space, $U_\mathcal{N} : \mathcal{H}_{A'} \to \mathcal{H}_B \otimes \mathcal{H}_E$ ($\mathcal{H}_E$ is now given to the eavesdropper), the
simultaneous (public, private) capacity region is given by the following modification of theorem 1:

• replace the state $\sigma^{XAB}$ by $\sigma^{XYBE}$, obtained by sending the $A'$ part of

$$\sum_{xy} p(x, y)\, |x\rangle\langle x|^X \otimes |y\rangle\langle y|^Y \otimes \rho_{xy}^{A'}$$

through the channel,

• replace $I(A\rangle BX)$ by $I(Y; B|X) - I(Y; E|X)$.

The corresponding theorem for classical "wire-tap" channels was proven in [?].

Secondly, one may conceive of a "static" analogue of the problem considered here, where Alice
and Bob share many copies of some (mixed) state $\rho^{AB}$ instead of being connected by a quantum
channel. In [10] the problem of generating common randomness (perfectly correlated bits) from
such a resource using limited forward (Alice to Bob) classical communication was considered. There
the "distillable common randomness" (DCR) was defined to be the maximum common randomness
obtainable in excess of the classical communication invested, and was advertised as an (asymmetric)
measure of the classical correlations in $\rho^{AB}$. In [?] the problem of one-way entanglement
distillation was solved, yielding a similarly asymmetric measure of quantum correlations in $\rho^{AB}$.
The next step is to unify the two results in a trade-off between DCR and distillable entanglement,
which could now be argued to quantify the total correlations in the state. Based on the results
of [10], [?] and the present paper we put forth the following conjecture: The simultaneously
distillable (classical, quantum) resources are given precisely by theorem 1, where now the test states
$\sigma^{XAB}$ are obtained by applying general instruments $\mathbf{D} = (\mathcal{D}_x)_{x \in \mathcal{X}}$ to the $A$ part of $\rho^{AB}$, rather
than arising from a channel. A sketch of the proof is as follows. The coding strategy involves
double blocking. First use the protocol of [10] on a block of length $n$ to establish a good
approximation to $X^n$ on Bob's side using $\approx nH(X|B)$ bits of forward communication. This already gives
us the desired DCR rate of $I(X; B)$. Now that Bob's system includes $X^n$ they may use further
blocking to distill entanglement at a rate of $I(A\rangle BX)$ [?]. The classical communication involved
in this distillation has now turned into common randomness, effecting no net change in the DCR.
The converse theorem is left as an exercise. A somewhat more ambitious goal would be to include
the classical communication cost in the trade-off, giving a 3-dimensional region!

The final remark we make is that the "piggy-backing" idea used in the proof of theorem 1
provides an alternative coding strategy to the one in [26] for the classical capacity of $\mathcal{N}$ with limited
entanglement assistance, thus establishing an additional connection between the two problems.
The original paper on the entanglement assisted capacity [?] describes how to achieve the pair
$(r, R) = (I(A; B)_\rho, -H(A)_\rho)$, for some $\rho^{AB} = (\mathbf{1}^A \otimes \mathcal{N})(\phi^{AA'})$ arising from the channel. Using a
mixture of codes corresponding to different channel inputs $|\phi_x\rangle^{AA'}$, one trivially achieves $(r, R) =
(I(A; B|X), -H(A|X))$ (with respect to $\sigma^{XAB}$). As it turns out, the distinguishability
of the channel outputs of different such code mixtures may be used to send extra classical information
at a rate of $I(X; B)$. This gives the region (7). A detailed version of this argument will appear in [?].

5 Discussion



In conclusion, an information theoretical characterization of the simultaneous (classical, quantum) 
capacity region has been derived. The key idea was to use a different quantum code depending on 
the classical information to be sent, thus "piggy-backing" the classical information on top of the 
quantum one. The formula derived requires optimization over potentially arbitrarily many copies 
of the channel. We have shown that for a generalized dephasing channel a single copy suffices. We 
have also presented some ideas on cryptographic as well as static analogues of this problem. 

We have already mentioned the open problem of including the classical communication cost in
the trade-off for the static analogue. Another interesting extension of our work, which in fact served
as our original motivation, is the following joint source-channel coding problem. In [15] the task
of quantum compression with classical side information was considered. This is a "visible" source
coding problem of a pure-state ensemble $E$. By storing partial information about the identity of
the states (classically) at a rate $C$ it is possible to reduce the quantum storage rate to some value
$Q(C)$. The joint source-channel coding variant of this problem is: Given $E$ and a channel $\mathcal{N}$,
what is the rate at which Alice can send the quantum part of the ensemble over the channel? One
approach is to first separate the source into a classical and quantum part using the trade-off of [15]
and then send them simultaneously through the channel using the trade-off of theorem 1. This
procedure is optimized over the ratio $\lambda$ of the classical and quantum rates, which should coincide
for the source and channel coding part. There are, however, "well matched" source-channel pairs
for which such a strategy is known to be suboptimal. The following example is due to Smolin [27].
The source is the equiprobable "trine" ensemble $(|0\rangle, |e^+\rangle, |e^-\rangle)$, where $|e^\pm\rangle = \frac{1}{2}|0\rangle \pm \frac{\sqrt{3}}{2}|1\rangle$, and
the channel $\mathcal{N} : \mathcal{H}_3 \to \mathcal{H}_2$ has operation elements $\{|0\rangle\langle 0|, |e^+\rangle\langle 1|, |e^-\rangle\langle 2|\}$. The channel has no
quantum capacity and a classical capacity of 1. Our strategy of separating the source and channel
coding gives a source-channel capacity of $1/\log 3$ transmitted copies of the ensemble per channel
use. On the other hand, by simply feeding the identity of the state to the channel one achieves
a source-channel capacity of 1. Finding a solution for an arbitrary $(E, \mathcal{N})$ pair remains an open
question.

Acknowledgments We thank Charles Bennett, Aram Harrow and John Smolin for useful 
discussions. ID is partially supported by the NSA under the ARO grant numbers DAAG55-98-C- 
0041 and DAAD19-01-1-06. 



A Proof of concavity of $S^{(1)}(\mathcal{N})$

Here we provide a proof that the region $S^{(1)}(\mathcal{N})$ defined by (6) is concave. Let $\sigma_0^{XAB}$ and $\sigma_1^{XAB}$
be two different states arising from the channel. For some $\lambda$ between 0 and 1, consider the state

$$\sigma^{UXAB} = \lambda\, |0\rangle\langle 0|^U \otimes \sigma_0^{XAB} + (1 - \lambda)\, |1\rangle\langle 1|^U \otimes \sigma_1^{XAB},$$

which also arises from the channel. Then

$$\lambda I(X; B)_{\sigma_0} + (1 - \lambda) I(X; B)_{\sigma_1} \le I(UX; B)_\sigma$$
$$\lambda I(A\rangle BX)_{\sigma_0} + (1 - \lambda) I(A\rangle BX)_{\sigma_1} = I(A\rangle BUX)_\sigma,$$

from which the claim follows.

B The capacity region for dephasing channels



In this section we define the notion of degradable channels and show that for such channels the 
quantum capacity is given by the single-letter formula 

Q{M) = Q^^\N) max/(A)B), (18) 

pAB 

where the maximization is over all states p"^^ arising from the channel M . For the special case of 
dephasing channels we shall prove that the entire trade-off curve can be single-letterized. 

Recall that a channel M : Ha' — » Tis can be defined by an isometric embedding LOv : Ti-A' 
Hb^'He, followed by a partial trace over the "environment" system E, so Af{p) — TrEUj^{p) |28j . 
This further induces the complementary channel J\f'^ : T-La' —^Ti-E defined by J\f'^{p) = TTBU^f{p). 
We call a channel J\f degradable when it may be degraded to its complementary channel A/''^, i.e. 
when there exists a map T : He ^ "He so that A/"^ = T oM. 

To see that $Q(\mathcal{N}) = Q^{(1)}(\mathcal{N})$ for degradable channels, note that Bob's output system $B$ may
be mapped by a fixed isometry onto a composite system $B'E'$ such that the channels from $A'$ to
$E'$ and to $E$ are the same. Thus, for any state arising from the channel,

$$\begin{aligned}
I(A\rangle B) &= H(B) - H(E) \\
 &= H(B'E') - H(E) \\
 &= H(B'E') - H(E') \\
 &= H(B'|E').
\end{aligned}$$

We can then use the inequality [21]

$$H(B'_1 B'_2 | E'_1 E'_2) \leq H(B'_1|E'_1) + H(B'_2|E'_2)$$

to prove that single-letter maximization already achieves $Q(\mathcal{N})$.

A subclass of degradable channels of particular interest is that of generalized dephasing channels. The
latter are defined on some $d$-dimensional Hilbert space with a preferred orthonormal basis $\{|i\rangle\}$
such that all states belonging to this basis are transmitted without error, but pure superpositions
of these basis states may become mixed. This implies that if $\mathcal{N}$ is a dephasing channel then its
isometric embedding obeys

$$U_{\mathcal{N}}\, |i\rangle^{A'} = |i\rangle^{B}\, |\phi_i\rangle^{E},$$

where the $|\phi_i\rangle^{E}$ are generally not mutually orthogonal. When the $|\phi_i\rangle^{E}$ are mutually orthogonal, $\mathcal{N}$
is the completely dephasing channel $\Delta_d$:

$$\Delta_d(\rho) = \sum_{i=1}^{d} |i\rangle\langle i|\, \rho\, |i\rangle\langle i|.$$

It is clear from the above that any dephasing channel $\mathcal{N}$ obeys

• $\Delta_d \circ \mathcal{N} = \mathcal{N} \circ \Delta_d = \Delta_d$

• $\mathcal{N}^c \circ \Delta_d = \mathcal{N}^c$

Every dephasing channel is degradable, since $\mathcal{N}$ may be degraded to $\Delta_d$, which may be further
degraded to $\mathcal{N}^c$. In fact, the map $\mathcal{T}$ can be taken to be $\mathcal{N}^c$. Therefore, $Q(\mathcal{N}) = Q^{(1)}(\mathcal{N})$. In
what follows, the special properties of dephasing channels will allow us to prove an even stronger
statement: that the outer boundary of $S(\mathcal{N})$ may be expressed as a single-letter formula.
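
The two identities above and the degradability claim can be checked numerically on a small example. The following is a minimal sketch, not taken from the paper. It assumes the isometry acts on basis states as $|i\rangle \mapsto |i\rangle|\phi_i\rangle$ for normalized environment vectors, so that the channel multiplies the $(i,j)$ entry of the input by $\langle\phi_j|\phi_i\rangle$ and the complementary channel outputs $\sum_i \rho_{ii}|\phi_i\rangle\langle\phi_i|$. All function and variable names (`phis`, `dephasing_channel`, and so on) are hypothetical.

```python
import math

def inner(u, v):
    """Inner product <u|v> for vectors given as lists of complex numbers."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def dephasing_channel(rho, phis):
    """N(rho)_{ij} = rho_{ij} <phi_j|phi_i> (assumed form, see lead-in)."""
    d = len(rho)
    return [[rho[i][j] * inner(phis[j], phis[i]) for j in range(d)]
            for i in range(d)]

def complete_dephasing(rho):
    """Delta_d: keep the diagonal, zero out all off-diagonal entries."""
    d = len(rho)
    return [[rho[i][j] if i == j else 0j for j in range(d)] for i in range(d)]

def complementary_channel(rho, phis):
    """N^c(rho) = sum_i rho_{ii} |phi_i><phi_i| (output lives on E)."""
    d_env = len(phis[0])
    out = [[0j] * d_env for _ in range(d_env)]
    for i, phi in enumerate(phis):
        for a in range(d_env):
            for b in range(d_env):
                out[a][b] += rho[i][i] * phi[a] * phi[b].conjugate()
    return out

def close(m1, m2, tol=1e-12):
    """Entrywise comparison of two matrices of equal shape."""
    return all(abs(x - y) < tol
               for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

if __name__ == "__main__":
    theta = 0.7  # arbitrary angle: the two environment states are non-orthogonal
    phis = [[1 + 0j, 0j],
            [complex(math.cos(theta)), complex(math.sin(theta))]]
    rho = [[0.3 + 0j, 0.2 + 0.1j], [0.2 - 0.1j, 0.7 + 0j]]
    deph = complete_dephasing(rho)
    n_rho = dephasing_channel(rho, phis)
    print("Delta o N = Delta:", close(complete_dephasing(n_rho), deph))
    print("N o Delta = Delta:", close(dephasing_channel(deph, phis), deph))
    print("N^c o Delta = N^c:",
          close(complementary_channel(deph, phis),
                complementary_channel(rho, phis)))
    print("N^c o N = N^c:",
          close(complementary_channel(n_rho, phis),
                complementary_channel(rho, phis)))
```

The last check illustrates the degrading map: since the complementary channel only reads the diagonal of its input, applying it after the channel reproduces the complementary channel itself.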






Consider some state $\sigma^{XABE}$ arising from the channel. Bob may degrade his channel further by
replacing his system $B$ by $B'Y$, where $Y$ now contains the completely dephased version of $B$ (this
is why we label it as a classical system). Set $\lambda \geq 1$ and define

$$f_\lambda(\mathcal{N}) = \max \left[ H(Y) + (\lambda - 1)\, H(Y|X) - \lambda\, H(E|X) \right],$$

where the maximization is over all $\sigma^{XYE}$ arising from the channel ($\sigma^{XYE}$ is implicit in the
entropies). We shall make use of the following lemma.

Lemma 7 For two generalized dephasing channels $\mathcal{N}_1$ and $\mathcal{N}_2$,

$$f_\lambda(\mathcal{N}_1 \otimes \mathcal{N}_2) = f_\lambda(\mathcal{N}_1) + f_\lambda(\mathcal{N}_2).$$

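Proof The "$\geq$" direction follows from the fact that the input ensemble for $\mathcal{N}_1 \otimes \mathcal{N}_2$ may be chosen
to be a tensor product of the ones maximizing $f_\lambda(\mathcal{N}_1)$ and $f_\lambda(\mathcal{N}_2)$. To show the opposite inequality,
in what follows let us refer to the state $\sigma^{XY_1Y_2E_1E_2}$ that maximizes $f_\lambda(\mathcal{N}_1 \otimes \mathcal{N}_2)$. Observe that

$$H(Y_1Y_2) = H(Y_1) + H(Y_2|Y_1),$$

$$H(Y_1Y_2|X) = H(Y_1|X) + H(Y_2|Y_1X)$$

and

$$\begin{aligned}
H(E_1E_2|X) &= H(E_1|X) + H(E_2|E_1X) \\
 &\geq H(E_1|X) + H(E_2|Y_1X),
\end{aligned}$$

the latter since $E_1$ contains a degraded version of $Y_1$ for all values of $X$, so that conditioning on
$Y_1$ cannot leave more uncertainty about $E_2$ than conditioning on $E_1$. Hence

$$\begin{aligned}
f_\lambda(\mathcal{N}_1 \otimes \mathcal{N}_2) &= H(Y_1Y_2) + (\lambda - 1)\, H(Y_1Y_2|X) - \lambda\, H(E_1E_2|X) \\
 &\leq H(Y_1) + (\lambda - 1)\, H(Y_1|X) - \lambda\, H(E_1|X) \\
 &\quad + H(Y_2|Y_1) + (\lambda - 1)\, H(Y_2|XY_1) - \lambda\, H(E_2|XY_1) \\
 &\leq f_\lambda(\mathcal{N}_1) + f_\lambda(\mathcal{N}_2).
\end{aligned}$$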



We shall use Lagrange multipliers to calculate $S(\mathcal{N})$. By theorem 1, the quantity to be maximized is

$$I(X;B) + \lambda\, I(A\rangle BX),$$

over all states $\sigma$ that arise from $\mathcal{N}^{\otimes n}$. Operationally it is clear that we should restrict attention
to $\lambda \geq 1$, since $-\lambda$ is the slope of the boundary of $S(\mathcal{N})$ and a qubit channel may always be
used to send classical bits at a unit rate. For any such state we have

$$\begin{aligned}
I(X;B) + \lambda\, I(A\rangle BX) &= H(B) + (\lambda - 1)\, H(B|X) - \lambda\, H(E|X) \\
 &\leq H(Y) + (\lambda - 1)\, H(Y|X) - \lambda\, H(E|X) \\
 &\leq n f_\lambda(\mathcal{N}).
\end{aligned}$$

The first inequality follows from the fact that complete dephasing increases entropy, and is
saturated by completely dephasing the input to $\mathcal{N}^{\otimes n}$ (recall that $\mathcal{N}$ commutes with $\Delta_d$). The second
inequality is by lemma 7. Thus, for dephasing channels, $S(\mathcal{N}) = S^{(1)}(\mathcal{N})$, which makes the
optimization problem tractable.
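
The opening equality in the chain above follows by expanding both information quantities into entropies. The short derivation below is a sketch under one stated assumption: conditional on each value of $X$, the joint state of $A$, $B$ and $E$ is pure, so that $H(AB|X) = H(E|X)$.

```latex
\begin{align*}
I(X;B) &= H(B) - H(B|X), \\
I(A\rangle BX) &= H(B|X) - H(AB|X) = H(B|X) - H(E|X), \\
I(X;B) + \lambda\, I(A\rangle BX)
  &= H(B) - H(B|X) + \lambda\, H(B|X) - \lambda\, H(E|X) \\
  &= H(B) + (\lambda - 1)\, H(B|X) - \lambda\, H(E|X).
\end{align*}
```

Because the coefficient $\lambda - 1$ of $H(B|X)$ is non-negative for $\lambda \geq 1$, replacing $B$ by its dephased version can only increase the first two terms, which is what the subsequent inequality uses.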

We now turn to the particular case of the qubit $p$-dephasing channel

$$\mathcal{N} = (1-p)\, \mathcal{I}_2 + p\, \Delta_2.$$

It is easily checked that the outer boundary of $S(\mathcal{N})$ is achieved by the $\mu$-parametrized family
of ensembles, $\mu \in [0,1/2]$, consisting of $\mathrm{diag}(\mu, 1-\mu)$ and $\mathrm{diag}(1-\mu, \mu)$ chosen with equal
probabilities. The trade-off curve is given by

$$(r, R) = \left(1 - h_2(\mu),\; h_2(\mu) - h_2\!\left(\tfrac{1}{2} + \tfrac{1}{2}\sqrt{1 - 16p(1-p)\mu(1-\mu)}\right)\right),$$

where $h_2(\mu) = -\mu \log_2 \mu - (1-\mu) \log_2(1-\mu)$ is the binary entropy function. Figure 2 shows this
curve for $p = 0.2$.
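
The closed-form expression is easy to tabulate. The following sketch simply evaluates the formula as printed above for a given $p$ and a grid of $\mu$ values; it does not derive the curve from the channel definition, and the function names (`h2`, `tradeoff_point`, `tradeoff_curve`) are our own.

```python
import math
from typing import List, Tuple

def h2(x: float) -> float:
    """Binary entropy in bits, using the convention 0 * log2(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def tradeoff_point(p: float, mu: float) -> Tuple[float, float]:
    """Return (r, R): classical and quantum rates for parameters p and mu."""
    # Clamp at zero to guard against tiny negative values from rounding.
    root = math.sqrt(max(0.0, 1.0 - 16.0 * p * (1.0 - p) * mu * (1.0 - mu)))
    r = 1.0 - h2(mu)
    big_r = h2(mu) - h2(0.5 + 0.5 * root)
    return r, big_r

def tradeoff_curve(p: float, steps: int = 11) -> List[Tuple[float, float]]:
    """Sample the curve at `steps` equally spaced mu values in [0, 1/2]."""
    return [tradeoff_point(p, 0.5 * k / (steps - 1)) for k in range(steps)]

if __name__ == "__main__":
    for r, big_r in tradeoff_curve(0.2):
        print(f"r = {r:.4f}   R = {big_r:.4f}")
```

At the endpoints the formula behaves as expected: $\mu = 0$ gives the purely classical point $(1, 0)$, while $\mu = 1/2$ gives $r = 0$ and $R = 1 - h_2(p)$.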



C Proof of the cardinality bound 

Here we justify the condition on the cardinality of $\mathcal{X}$ in the statement of theorem 1. Caratheodory's
theorem states that in a $t$-dimensional Euclidean space, each point of the convex hull of a connected compact set $\mathcal{K}$
can be represented as a convex combination of at most $t + 1$ points in $\mathcal{K}$. Let $\mathcal{F}(\mathcal{H})$ be the family
of all density operators on the Hilbert space $\mathcal{H}_{A'}$ of dimension $d$. Let $\mathcal{K}$ be the image of $\mathcal{F}(\mathcal{H})$
under some continuous mapping $f$ defined by $f(\rho) = (f_1(\rho), \ldots, f_t(\rho))$. As $\mathcal{F}(\mathcal{H})$ is connected and
compact, so is $\mathcal{K}$. Then for any probability measure $\mu$ on the algebra of density operators of $\mathcal{H}_{A'}$,
Caratheodory's theorem implies the existence of some finite ensemble $\{p_x, \rho_x : x \in \mathcal{X}\}$, $|\mathcal{X}| = t + 1$,
such that

$$\int \mu(d\rho)\, f_j(\rho) = \sum_x p_x f_j(\rho_x), \qquad \forall j \in [t].$$

Turning to our problem, the quantities $I(X;B)$ and $I(A\rangle BX)$ depend solely on the ensemble
$E = \{p_x, \rho_x\}$, where $\rho_x := \phi_x^{A'}$, and the channel $\mathcal{N}$. Moreover, they only depend on the vector
$\sum_x p_x f(\rho_x)$, where the vector valued function $f$ is defined so that $f_1, \ldots, f_{d^2-1}$ are the $d^2 - 1$
degrees of freedom of $\rho$ (linear in $\rho$), $f_{d^2}(\rho) = H(\mathcal{N}(\rho))$ and $f_{d^2+1}(\rho) = I_c(\rho, \mathcal{N})$. Suppose that
a particular point in $S^{(1)}(\mathcal{N})$ corresponds to some ensemble $E' = \{\mu(d\rho), \rho\}$. The above implies
that the same point is achievable by a finite ensemble with at most $d^2 + 2$ elements.
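
The count of ensemble elements can be spelled out explicitly. The tally below is only an illustration of the bound just stated; the qubit value $d = 2$ is our own example.

```latex
\begin{align*}
t &= \underbrace{(d^2 - 1)}_{\text{linear coordinates of } \rho}
   + \underbrace{1}_{H(\mathcal{N}(\rho))}
   + \underbrace{1}_{I_c(\rho,\, \mathcal{N})}
   = d^2 + 1, \\
|\mathcal{X}| &\leq t + 1 = d^2 + 2,
   \qquad \text{e.g. } d = 2 \;\Rightarrow\; |\mathcal{X}| \leq 6.
\end{align*}
```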